An Analysis of Relationships Between Mathematics and Science Achievement in
TIMSS and TIMSS-R

Abstract 

Articulation of mathematics and science education is advocated in the official
documents of several professional organizations. To assess the benefit of curriculum
integration, a national indicator needs to be developed from a correlation study of
performance scores between the two subjects. In this study, a correlation analysis is
conducted at the 8th grade level using international databases from the Third
International Mathematics and Science Study (TIMSS) and its repetition in 1999
(TIMSS-R). The empirical results are examined over different score scales and statistical
transformations. The correlation coefficient ranges between .61 and .78, suggesting that
around 36% - 60% of mathematics or science performance can be accounted for by the
relationship between these two subjects.

Improvement of student achievement in mathematics and science is part of the 
Educate America Act passed by the U.S. Congress (H.R. 1804). Important documents, 
such as Professional Standards for Teaching Mathematics (National Council of Teachers 
of Mathematics [NCTM], 1991), Curriculum and Evaluation Standards for School 
Mathematics (NCTM, 1989), National Science Education Standards (National Research 
Council [NRC], 1996), and Benchmarks for Science Literacy (American Association for 
the Advancement of Science [AAAS], 1993) have been developed by professional 
organizations to strengthen articulation of mathematics and science education. All these
national initiatives were built on the assertion that these two subjects are interrelated
and that an integrated curriculum may help improve student performance in either subject
(see Czerniak, Weber, Sandmann, & Ahern, 1999; Hurley, 2001; Lonning & DeFranco, 1997).

Guided by these national standards, many educators have been involved in 
curriculum reforms across the nation. “Although there have been numerous curriculum 
development projects aimed at the integration of science and mathematics education, 
there has been very little research to evaluate their effectiveness” (Berlin, 1989, p. 74). 
Ten years later, Miller and Davison (1999) still raised the same question, “What
improvements in student learning should be expected from the application of an 
integrated curriculum?” (p. 29). 
In part, the void in this area was caused by a lack of assessment data to correlate
student performances between mathematics and science. For more than three decades,
the National Assessment of Educational Progress (NAEP) has been one of the primary
projects to assess the condition of U.S. education. Due to the nature of the NAEP design,
mathematics and science scores were gathered from different student samples (Allen,
Carlson, & Zelenak, 1999). Thus, no students took the NAEP science and mathematics
tests concurrently, and no inter-disciplinary analysis can be conducted using the NAEP
database.

Besides the domestic projects, large-scale comparative data have been released 
from the Third International Mathematics and Science Study (TIMSS) and a repeat of the 
TIMSS project (TIMSS-R) in the late 1990s. Widely cited as an international benchmark 
in education, the TIMSS and TIMSS-R projects incorporated both mathematics and 
science tests to measure student academic performance (Martin & Mullis, 1996; Mullis, 
et al., 2000). Accordingly, an analysis of the score correlations can be conducted in this 
study to assess the relationship between mathematics and science achievements. 
Statistical indicators can be developed from the analysis of U.S. 8th grade databases from 
the TIMSS and TIMSS-R projects. To date, TIMSS/TIMSS-R reports have been largely
divided along subject boundaries (e.g., Beaton et al., 1996a, b; Martin et al., 1998,
2001; Mullis, et al., 1998, 2001). A unique feature of this investigation is its
concerted effort to articulate the assessments of student achievement between the two
core subjects.
Literature Review

Because student test scores are measured on an interval scale, the Pearson correlation
coefficient is an appropriate choice to assess the linear relationship between mathematics
and science achievements (Ott, 1993). The formula of Pearson r can be written as:

$$ r = \frac{\mathrm{cov}(x_1, x_2)}{\sqrt{\mathrm{var}(x_1)\,\mathrm{var}(x_2)}} \qquad (1) $$

Therefore, the computation of Pearson r depends on the values of the variances [i.e.,
$\mathrm{var}(x_1)$, $\mathrm{var}(x_2)$] and the covariance [i.e., $\mathrm{cov}(x_1, x_2)$].
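
As an illustration of Equation (1), the following minimal sketch computes Pearson r directly from the sample covariance and variances. The function name `pearson_r` and the score values are hypothetical and are not TIMSS data. The sketch is also unweighted, so it corresponds to the simple random sampling assumption rather than to the complex sampling design discussed next.

```python
import math


def pearson_r(x1, x2):
    """Pearson r computed as cov(x1, x2) / sqrt(var(x1) * var(x2))."""
    n = len(x1)
    if n != len(x2) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    mean1 = sum(x1) / n
    mean2 = sum(x2) / n
    # Sample covariance and variances (the n - 1 divisors cancel in the ratio).
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(x1, x2)) / (n - 1)
    var1 = sum((a - mean1) ** 2 for a in x1) / (n - 1)
    var2 = sum((b - mean2) ** 2 for b in x2) / (n - 1)
    return cov / math.sqrt(var1 * var2)


# Hypothetical mathematics and science scores for five students.
math_scores = [480.0, 510.0, 530.0, 455.0, 600.0]
science_scores = [470.0, 525.0, 540.0, 450.0, 580.0]
r = pearson_r(math_scores, science_scores)
print(f"r = {r:.3f}")
```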

Estimation of the variance and covariance parameters hinges on the sampling 
structure. Built on the NAEP experience, the TIMSS/TIMSS-R projects employed 
stratified/cluster sampling techniques to facilitate the data collection (Martin, Gregory, & 
Stemler, 2000; Martin & Kelly, 1997). According to Kish (1965), the assumption of 
simple random sampling tends to underestimate the variability of statistical estimates for 
stratified samples. The difference can be described by the design effect (deff):

$$ \mathrm{deff} = \frac{\text{variance from complex sampling}}{\text{variance from simple random sampling}} $$

The American Institutes for Research [AIR] (2003) developed and upgraded a
special software package entitled “AM” to analyze data from complex samples. AIR
noted,

AM is a statistical software package for analyzing data from complex samples, 
especially large-scale assessments such as the National Assessment of 
Educational Progress (NAEP) and the Third International Mathematics and 
Science Studies (TIMSS). (http://am.air.org, p. 1) 

For the TIMSS/TIMSS-R data, it was reported that a total of five plausible
scores have been computed for each student in each subject area, and “one set of the
imputed plausible scores can be considered as good as another” (Gonzalez & Smith,
1997, ch. 6, p. 3). The interchangeability of plausible scores suggests equivalency of the
design effect (deff) among the plausible scores. Under the assumption of an invariant
design effect, the AM software is employed to compute correlation coefficients among
the plausible scores in each subject.
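
The design effect defined above can be illustrated with a short sketch. The function names and the variance values below are hypothetical, chosen only to show the arithmetic; they are not estimates from TIMSS, and the sketch does not reproduce what the AM software does internally. Because deff is a ratio of variances, a standard error computed under simple random sampling is inflated by the square root of deff.

```python
import math


def design_effect(var_complex: float, var_srs: float) -> float:
    """Ratio of the complex-sample variance to the simple-random-sample variance."""
    if var_srs <= 0:
        raise ValueError("var_srs must be positive")
    return var_complex / var_srs


def se_under_complex_design(se_srs: float, deff: float) -> float:
    """Variances scale by deff, so standard errors scale by sqrt(deff)."""
    return se_srs * math.sqrt(deff)


# Hypothetical variances of the same estimate under the two sampling assumptions.
deff = design_effect(var_complex=4.0, var_srs=1.0)
se = se_under_complex_design(se_srs=0.5, deff=deff)
print(deff, se)
```

A deff above 1 means that treating a stratified or cluster sample as a simple random sample understates the variability of the estimate, which is the concern attributed to Kish (1965) above.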

Research Questions 

In the TIMSS and TIMSS-R databases, TIMSS scores have been scaled twice in
the 1990s. The old scale was built on the single-parameter Rasch model, and has been
used in the original TIMSS reports (Beaton et al., 1996a, b). TIMSS-R employed a
three-parameter model from Item Response Theory (IRT) (Martin et al., 2001; Mullis, et
al., 2001). Thus, the TIMSS data have been re-scaled by the three-parameter IRT model
to examine the trend between TIMSS and TIMSS-R (Martin, Gregory, & Stemler, 2000).
For this reason, plausible scores were computed in three data files: (1) TIMSS
original data, (2) TIMSS-R data, and (3) TIMSS re-scaled data (Gonzalez & Smith, 1997;
Martin, Gregory, & Stemler, 2000). To triangulate the research findings from these
measures at the 8th grade level, correlational analyses have been conducted in this study
to address the following questions:

1. What are the correlation coefficients between mathematics and science achievements 
using different methods of statistical summary? 

2. Are there any differences in the TIMSS correlation coefficients between the new and
old scales?

3. Are there any differences in results of the correlational analyses between TIMSS and
TIMSS-R?



Methods 

After obtaining the correlation coefficients between plausible scores of 
mathematics and science achievements, an average of these coefficients is needed to 
present the statistical findings. However, Fisher (1921) raised a caution against the use of 
a simple average for the r coefficients. He wrote: 

In the neighbourhood of +1, the [correlation coefficient distribution] curves
become extremely skew, even for large samples, and change their form so rapidly
that the ordinary statement of the “probable error” is practically valueless. It
was accordingly suggested that the variable r was unsuitable for expressing the
accuracy of an observed correlation in these regions but that, by a simple
transformation, a variable might be obtained the sampling curves of which are
practically normal and of constant standard deviation. (pp. 1-2)

Corey, Dunlap, and Burke (1998) concurred, "When correlations come from a matrix, 
there is a consistent advantage associated with using [Fisher’s] z'. Across sample size 
and numbers of correlations averaged, bias in average r(z)’ is smaller than bias in average 
r" (p. 260). 

In this study, the average of r is calculated and compared to the corresponding 
results from Fisher's (1921) z transformation to check the alternative r estimates between 
mathematics and science achievements (Question 1). In addition, findings from the new 
and old TIMSS scales are examined to assess stability of the r estimates (Question 2). 
The TIMSS and TIMSS-R results are further analyzed to disentangle the trend of the
correlation coefficients at the 8th grade between 1995 and 1999 (Question 3).
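
The two averaging approaches compared in this section can be sketched as follows. This is a minimal illustration under stated assumptions: the five coefficients are hypothetical stand-ins for the correlations obtained from five plausible scores, the function names are invented for the example, and the z values are averaged without weights because every coefficient would come from the same student sample.

```python
import math


def average_r(rs):
    """Simple arithmetic mean of the correlation coefficients."""
    return sum(rs) / len(rs)


def fisher_average_r(rs):
    """Average on Fisher's z scale, then transform back to the r scale."""
    zs = [math.atanh(r) for r in rs]  # z = 0.5 * ln((1 + r) / (1 - r))
    return math.tanh(sum(zs) / len(zs))


# Hypothetical coefficients, one per plausible score.
rs = [0.74, 0.75, 0.76, 0.75, 0.74]
print(f"average r      = {average_r(rs):.5f}")
print(f"Fisher's r(z)  = {fisher_average_r(rs):.5f}")
```

When the coefficients are close to one another, the two summaries differ only in the later decimal places. The gap widens when the coefficients are spread out or lie near +1, which is the situation Fisher (1921) cautioned about.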

Results 

Perhaps because of the fairly large sample size involved in the correlation
computation, no substantial difference has been found between the results from the
simple average of r and Fisher’s z transformation (see Table 1). On the other hand,
significant differences have been found in the correlation coefficients between the old
and new scales in TIMSS. This gap seems to justify the need for a scale transformation
in the TIMSS and TIMSS-R trend analysis (Martin et al., 2001; Mullis, et al., 2001). On
the same new scale of mathematics and science scores, the TIMSS and TIMSS-R results
seem fairly consistent, with a relative fluctuation of less than 4% of the r value. Across
all these measures, a moderate to strong degree of relationship (.61<r<.78) has been
found between mathematics and science achievements.
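
The relative fluctuation quoted above can be checked against the new-scale values of average r in Table 1. The worked arithmetic below assumes that the TIMSS new-scale coefficient serves as the base of the ratio; the text does not state which base was used.

```latex
\frac{r_{\text{TIMSS-R}} - r_{\text{TIMSS, new}}}{r_{\text{TIMSS, new}}}
  = \frac{.77766 - .74936}{.74936}
  = \frac{.02830}{.74936}
  \approx .038
```

Taking the TIMSS-R coefficient as the base instead gives about .036, so the fluctuation stays below 4% under either choice.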

Discussions 

The history of integrating mathematics and science instruction can be traced back 
to at least the beginning of the 20th century (see Isaacs, Wagreich, & Gartzman, 1997; 
Lehman & McDonald, 1988). Some educators believe that “The integration of science 
and mathematics education can provide real world experiences and applications which 
may encourage student involvement and facilitate the understanding of both science and 
mathematics concepts, skills, and processes” (Berlin, 1989, p. 73).

Despite the persistent encouragement for curriculum articulation by professional
organizations, the connection between mathematics and science may vary in specific
subject domains. On the one hand, much of physics cannot be properly covered without
calling on mathematical concepts and skills. On the other hand, the mathematical demand
is not as strong in biology, and “other sciences such as psychology might not yet be ready
for the kind of mathematization that has taken place in physics” (Orton & Roper, 2000, p.
124).

Consequently, science and mathematics educators have to deal with an
“unfocused definition of integration” (Czerniak, Weber, Sandmann, & Ahern, 1999, p.
422). Huntley (1998) proposed a mathematics/science continuum on which both ends
represent a clear separation of mathematics and science, and the center represents
complete integration. The TIMSS and TIMSS-R results show an average correlation
coefficient between .61 and .78 (see Table 1). Converted to a coefficient of
determination (r²), the results seem to suggest that an integration effort might account for
36% - 60% of mathematics or science performance in the United States according to the
international measurements at the 8th grade level. Thus, neither too much nor too little
emphasis on curriculum integration seems to be supported by the existing
database.
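
The conversion from r to the coefficient of determination can be reproduced from the average r values reported in Table 1. The worked arithmetic below is only a check on the quoted range and uses no data beyond that table.

```latex
\begin{aligned}
\text{TIMSS, old scale:}   \quad & r^2 = (.61217)^2 \approx .375 \\
\text{TIMSS, new scale:}   \quad & r^2 = (.74936)^2 \approx .562 \\
\text{TIMSS-R, new scale:} \quad & r^2 = (.77766)^2 \approx .605
\end{aligned}
```

The squared coefficients therefore span roughly 37% to 60%, close to the rounded range quoted in the text.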

More specifically, the TIMSS instrument also includes some items covering 
applications of mathematics knowledge in scientific inquiry. For instance, an item on 
proportional reasoning reads: 
L14. The table shows the values of x and y, where x is proportional to y.

    x    3    6    P
    y    7    Q    35

What are the values of P and Q?

A. P = 14 and Q = 31
B. P = 10 and Q = 14
C. P = 10 and Q = 31
D. P = 14 and Q = 15
E. P = 15 and Q = 14



This type of data imputation has been employed in the deduction of various models in
physics and chemistry, such as Charles’ law of thermodynamics (Sears, Zemansky, &
Young, 1987). Whereas random guessing over the five choices could have generated a
20% correct response rate, only 24% of eighth grade participants responded correctly to
this TIMSS question (http://www.timss.org).
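
For reference, the proportional-reasoning item quoted above can be worked in two steps. The derivation assumes the table pairs x = 3, 6, P with y = 7, Q, 35, so that the ratio x/y is constant.

```latex
\frac{x}{y} = \frac{3}{7}
\quad\Longrightarrow\quad
\frac{6}{Q} = \frac{3}{7} \;\Rightarrow\; Q = 14,
\qquad
\frac{P}{35} = \frac{3}{7} \;\Rightarrow\; P = 15
```

Under that reading, the values P = 15 and Q = 14 correspond to option E.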

In another example, students were asked to use a classical relationship among
time, displacement, and velocity (Hagelberg, 1973). Given four options in the following
item, the probability of obtaining a correct answer through guessing is 25%. Across all
TIMSS participating nations, only 27% of 8th graders answered this question correctly.
Hambleton (1988) pointed out, “with difficult multiple-choice tests, a researcher might
anticipate considerable guessing on the part of examinees. Needed, therefore, would be a
model that could handle this situation” (p. 154).

Q16. How long does it take light from the nearest star other than the Sun to
reach Earth?

A. Less than 1 second
B. About 1 hour
C. About 1 month
D. About 4 years

The TIMSS old scale was built on the single-parameter Rasch model, and has
resulted in findings different from those of the TIMSS-R three-parameter IRT model.
In general, the Rasch model can be considered as a special case of the three-parameter
IRT model under assumptions of equal item discrimination and no correct guessing
among low ability examinees (Hambleton, 1988; Hambleton & Swaminathan, 1985).
According to Lange (1997), the TIMSS instrument includes 429 multiple-choice, 43
short-response, and 29 extended-response items. The large number of multiple-choice
items, along with the low correct-response rate in this merging area between mathematics
and science, seems to support the effort of TIMSS researchers to rescale the TIMSS
results on a three-parameter IRT model that takes the guessing effect into
consideration. On the new IRT scale, the correlation coefficient between mathematics
and science performances is in a range above .74 and below .78, showing a much higher
level of consistency between the TIMSS and TIMSS-R findings (see Table 1).
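
The relationship between the two scaling models can be sketched with the standard logistic form of the three-parameter model. This is an illustration under stated assumptions, not the TIMSS scaling procedure: the parameter names `a` (discrimination), `b` (difficulty), and `c` (guessing) follow common IRT notation, the numeric values are hypothetical, and the optional scaling constant used in some formulations is omitted.

```python
import math


def p_correct_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the three-parameter logistic model."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def p_correct_rasch(theta: float, b: float) -> float:
    """Rasch model as the special case with equal discrimination and no guessing."""
    return p_correct_3pl(theta, a=1.0, b=b, c=0.0)


# Hypothetical low-ability examinee facing a difficult four-option item.
theta, b = -3.0, 1.0
print(f"Rasch: {p_correct_rasch(theta, b):.3f}")
print(f"3PL with c = 0.25: {p_correct_3pl(theta, a=1.0, b=b, c=0.25):.3f}")
```

With the guessing parameter set to zero, the predicted success rate of a low-ability examinee approaches zero. With a nonzero guessing parameter it stays near the chance level, which is the pattern the low correct-response rates on the two quoted items resemble.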

In summary, although various school initiatives have been introduced across the
United States to integrate mathematics and science curricula (e.g., Judson & Sawada,
2000; Stallings & Ottinger, 1994; Woolnough, 2000), few researchers have examined
empirical evidence to disentangle the relationship between the two subject scores.

TIMSS and TIMSS-R data provided a unique opportunity to investigate the score
correlation using different statistical transformations and measurement scales. Measured
by the coefficient of determination (r²), the data analysis appears to indicate that an
integration effort can account for 36% - 60% of mathematics or science performance at
the 8th grade level. Joint efforts seem to be needed from mathematics and science
educators to further improve student performance on these mathematics-science linkage
items beyond the level of random guessing.
Table 1

Correlation coefficients between mathematics and science achievements at the 8th grade

Project    Measures     N       Average r    Fisher's r(z)
TIMSS      Old scale    7087    .61217       .61220
           New scale    7087    .74936       .74937
TIMSS-R    New scale    9072    .77766       .77763